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Abstract 

The asymmetric Hopfield model is used to simulate signaling dynamics in gene regulatory networks. The model allows for a 
direct mapping of a gene expression pattern into attractor states. We analyze different control strategies aimed at 
disrupting attractor patterns using selective local fields representing therapeutic interventions. The control strategies are 
based on the identification of signaling bottlenecks, which are single nodes or strongly connected clusters of nodes that 
have a large impact on the signaling. We provide a theorem with bounds on the minimum number of nodes that guarantee 
control of bottlenecks consisting of strongly connected components. The control strategies are applied to the identification 
of sets of proteins that, when inhibited, selectively disrupt the signaling of cancer cells while preserving the signaling of 
normal cells. We use an experimentally validated non-specific and an algorithmically-assembled specific B cell gene 
regulatory network reconstructed from gene expression data to model cancer signaling in lung and B cells, respectively. 
Among the potential targets identified here are TP53, FOXIVll, BCL6 and SRC. This model could help in the rational design of 
novel robust therapeutic interventions based on our increasing knowledge of complex gene signaling networks. 
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Introduction 

The vision behind systems biology is that complex interactions 
and emergent properties determine the behavior of biological 
systems. Many theoretical tools developed in the framework of 
spin glass models are well suited to describe emergent properties, 
and their application to large biological networks represents an 
approach that goes beyond pinpointing the behavior of a few 
genes or metabolites in a pathway. The Hopfield model [1] is a 
spin glass model that was introduced to describe neural networks, 
and that is solvable using mean field theory [2]. The asymmetric 
case, in which the interaction between the spins can be seen as 
directed, can also be exacty solved in some limits [3] . The model 
belongs to the class of attractor neural networks, in which the spins 
evolve towards stored attractor patterns, and it has been used to 
model biological processes of high current interest, such as the 
reprogramming of pluripotent stem cells [4]. Moreover, it has 
been suggested that a biological system in a chronic or therapy- 
resistant disease state can be seen as a network that has become 
trapped in a pathological Hopfield attractor [5] . A similar class of 
models is represented by Random Boolean Networks [6], which 
were proposed by Kauffinan to describe gene regulation and 
expression states in cells [7]. Differences and similarities between 



the Kauffman-type and Hopfield-type random networks have 
been studied for many years [8-1 1]. 

In this paper, we consider an asymmetric Hopfield model buUt 
from real (even if incomplete [12,13]) cellular networks, and we 
map the spin attractor states to gene expression data from normal 
and cancer cells. We will focus on the question of controling of a 
network's final state (after a transient period) using external local 
fields representing therapeutic interventions. To a major extent, 
the final determinant of cellular phenotype is the expression and 
activity pattern of all proteins within the cell, which is related to 
levels of mRNA transcripts. Microarrays measure genome-wide 
levels of mRNA expression that therefore can be considered a 
rough snapshot of the state of the cell. This state is relatively stable, 
reproducible, unique to cell types, and can differentiate cancer 
cells from normal cells, as well as differentiate between different 
types of cancer [14,15]. In fact, there is evidence that attractors 
exist in gene expression states, and that these attractors can be 
reached by different trajectories rather than only by a single 
transcriptional program [16]. While the dynamical attractors 
paradigm has been originally proposed in the context of cellular 
developement, the similarity between cellular ontogenesis, i.e. the 
developement of different cell types, and oncogenesis, i.e. the 
process under which normal cells are transformed into cancer 
cells, has been recently emphasized [17]. The main hypothesis of 
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this paper is that cancer robustness is rooted in the dynamical 
robustness of signahng in an underlying cellular network. If the 
cancerous state of rapid, uncontrolled growth is an attractor state 

of the system [18], a goal of modeling therapeutic control could be 
to design complex therapeutic interventions based on drug 
combinations [19] that push the cell out of the cancer attractor 
basin [20]. 

Many authors have discussed the control of biological signaling 
networks using complex external perturbations. Calzolari and 
coworkers considered the effect of complex external signals on 
apoptosis signaling [21]. Agoston and coworkers [22] suggested 
that perturbing a complex biological network with partial 
inhibition of many targets could be more effective than the 
complete inhibition of a single target, and explicitly discussed the 
implications for multi-drug therapies [23]. In the traditional 
approach to control theory [24] , the control of a dynamical system 
consists in finding the specific input temporal sequence required to 
drive the system to a desired output. This approach has been 
discussed in the context of Kauffmann Boolean networks [25] and 
their attractor states [26]. Several studies have focused on the 
intrinsic global properties of control and hierarchical organization 
in biological networks [27,28]. A recent study has focused on the 
minimum number of nodes that needs to be addressed to achieve 
the complete control of a network [29]. This study used a linear 
control framework, a matching algorithm [30] to find the 
minimum number of controllers, and a replica method to provide 
an analytic formulation consistent with the numerical study. 
Finally, Cornelius et al. [31] discussed how nonlinearity in network 
signaling allows reprogrammig a system to a desired attractor state 
even in the presence of contraints in the nodes that can be 
accessed by external control. This novel concept was explicitly 
applied to a T-cell survival signaling network to identify potential 
drug targ(;ts in T-LGL leukemia. The approach in the present 
paper is based on nonlinear signaling rules and takes advantage of 
some useful properties of the Hopfield formulation. In particular, 
by considering two attractor states we will show that the network 
separates into two t^pes of domains which do not interact with 
each other. Moreover, the Hopfield framework allows for a direct 
mapping of a gene expression pattern into an attractor state of the 
signaling dynamics, facilitating the integration of genomic data in 
the modeling. 

The paper is structured as follows. In Mathematical Model we 
summarize the model and review some of its key properties. 
Control Strategies describes general strategies aiming at selectively 
disrupting the signaling only in cells that are near a cancer 
attractor state. The strategies we have investigated use the concept 
of bottlenecks, which identify single nodes or strongly connected 
clusters of nodes that have a large impact on the signaling. In this 
section we also provide a theorem with bounds on the minimum 
number of nodes that guarantee control of a botdeneck consisting 
of a strongly connected component. This theorem is useful for 
practical applications since it helps to establish whether an 
exhaustive search for such minimal set of nodes is practical. In 
Cancer Signaling we apply the methods from Control Strategies to 
lung and B cell cancers. We use two different networks for this 
analysis. The first is an experimentally validated and non-specific 
network (that is, the observed interactions are compiled from 
many experiments conducted on heterogeneous cell types) 
obtained from a kinase interactome and phospho-protein database 
[32] combined with a database of interactions between transcrip- 
tion factors and their target genes [33]. The second network is cell- 



specific and was obtained using network reconstruction algorithms 
and transcriptional and post-translational data from mature 
human B cells [34]. The algorithmically reconstructed network 

is significandy more dense than the experimental one, and the 
same control strategies produce different results in the two cases. 
Finally, we close with Conclusions. 

Methods 

Mathematical Model 

We define the adjacency matrix of a network G composed of 
nodes as 



1 ify- 



Aij-- , 

' 0 otherwise 



(1) 



where j->-i denotes a directed edge from node j to node i. The set 
of nodes in the network G is indicated by V(G) and the set of 

directed edges is indicated by E(G) = {(J,i) : j^i}. (See Table 1 
for a list of mathematical symbols used in the text.) The spin of 
node i at time t is ff,(;) = ± 1, and indicates an expresssed (-1- 1) or 
not expressed (—1) gene. We encode an arbitrary attractor state 
(J = (Cl,C2v>iAr) with ^, = + 1 by defining the coupling matrix [1] 



Jy — ^ijiiij- 



(2) 



The total field at node i is then hi = hf^ + JijOj, where hf^ is 
the external field apphed to node which will be discussed below. 
The discrete-time update scheme is defined as 



(Ti{t + M)-- 



+ 1 with prob. {\+&i^[-hi{t)IT]y 
- 1 with prob. (1 -l-exp[-|-;!,(0/7'])" 



(3) 



where 7'>0 is an effective temperature. For the remainder of the 
paper, we consider the case of T = 0 so that (7, = sign(/!,), and the 
spin is chosen randomly from +1 if /!, =0. For convenience, we 
take feZ and Ar=l. Nodes can be updated synchronously, and 
synchronous updating can lead to limit cycles [9] . Nodes can also 
be updated separately and in random order (anynchronous 
updating), which does not result in limit cycles. All results 
presented in this paper use the synchronous update scheme. 

Source nodes (nodes with zero indegree) are fixed to their initial 
states by a small external field so that (7^(f) = Cj(0) for all q^Q, 
where Q is the set of source nodes. However, the source nodes flip 
if directly targeted by an external field. Biologically, genes at the 
"top" of a network are assumed to be controlled by elements 
outside of the network. 

In appUcation, two attractors are needed. Define these states as 
and (J'^, the normal stale and cancer state, respectively. The 
magnetization along attractor state a is 



^ i=l 



(4) 



Note that if ^"(0= ± 1, ff(0= We also defme the steady 
state magnetization along state a as 
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Table 1. Reference table for symbols. 



Symbol 


Explanation 


Q 




N 


Number of nodes 




Adjacency matrix 


V(G) 


Set of nodes in G 


E(G) 


Set of edges in G 




Outdegree/indegree of node / 




Spin of node /, = + 1 


s° 


attractor 


C 


Normal/cancer attractor 


Jii 


Coupling matrix 


fii 


Total field at node / 




External field applied to node z 


T 


Temperature 


Q 


Set of source and effective source nodes 


m'it) 


Magnetization along attractor a at time t 


"4 


Steady-state magnetization along attractor a 


P 


Number of attractors in coupling matrix 


S 


Set of similarity nodes 


D 


Set of differential nodes 


L(B) 


Control set of bottleneck B 


I{B) 


Impact of bottleneck B 


C 


Cycle cluster 


B 


Size k bottleneck, where k = \B\ 


Z(B,G) 


Set of critical nodes for bottleneck B in network G 


ncr,t(B,G) 


Critical number of nodes in bottleneck B in network G 


R{C,G) 


Set of externally influenced nodes 


W{C,G) 


Set of intruder connections 


Z„dC,G) 


Reduced set of critical nodes 




Minimum indegree of all nodes in a cycle cluster 




Critical efficiency of bottleneck B 


eoft(B) 


Optimal efficiency of bottleneck B 



This table lists all important symbols introduced in the article with a brief 

explanation of its purpose. 

doi:10.1371/journal.pone.0105842.t001 



(5) 



There are two ways to model normal and cancer cells. One way 
is to simply define a different coupling matrix for each attractor 
state a, 



(6) 



Alternatively, both attractor states can be encoded in the same 
coupling matrix, 



(7) 



Systems using Eqs. 6 and 7 will be referred to as the one 
attractor state (p=l) and two attractor state {p = 2) systems, 
respectively. Eqs. 6 and 7 are particular cases of the general 
Hopfield form [1] 



(8) 



where p is the number of attractor states, often taken to be large. 
An interesting property emerges when p = 2, however. Consider a 
simple network composed of two nodes, with only one edge 1^2 
with attractor states ^" and i"^, and T = 0. The only nonzero entry 



of the matrix Jy is 



Note that if i" = 
have 



/2i=ClC^ + c5cJ. (9) 
•^2i=2{2Cp In either case, by Eq. 3 we 



-ci if<7i(o=-c'r 



(10) 



that is, the spin of node 2 at a given time step will be driven to 
match the attractor state of node 1 at the previous time step. 
However, if c" = + and = + £2, /21 =0. This gives 



(72(0 = 



1 with probability I/2 
■ 1 with probability ^2 



(11) 



In this case, node 2 receives no input from node 1 . Nodes 1 and 2 
have become effectively disconnected. 

This motivates new designations for node types. We define 
similarity nodes as nodes with £" = cj', and differential nodes 
as nodes with c" = — C^. We also define the set of similarity 
nodes S={i : £" = d} and the set of differential nodes D = {i : £" 
= — (^J^}. Connections between two similarity nodes or two 
differential nodes remain in the network, whereas connections 
that link nodes of different types transmit no signals. The effective 
deletion of edges between nodes means that the original network 
fully separates into two subnetworks: one composed entirely of 
similarity nodes (the similarity network) and another composed 
entirely of differential nodes (the differential network), each of 
which can be composed of one or more separate weakly connected 
components (see Fig. 1). With this separation, new source nodes 
(effective sources) can be exposed in both the similarity and 
differential networks. For the remainder of this article, Q is the set 
of both source and effective source nodes in a given network. 

Control Strategies 

The strategies presented below focus on selecting the best single 
nodes or small clusters of nodes to control, ranked by how much 
they individually change m'!^ . In application, however, controlling 
many nodes is necessary to achieve a suflficiendy changed wi^. 
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The effects of controlhng a set of nodes can be more than the sum 
of the effects of controUing individual nodes, and predicting the 
truly optimal set of nodes to target is computationally difficult. 
Here, we discuss heuristic strategies for controlling large networks 
where the combinatorial approach is impractical. 

For both p= \ and p = 2, simulating a cancer cell means that 

5(0)= +^', and likewise for normal cells. Although the normal 
and cancer states are mathematically interchangeable, biologically 
we seek to decrease m^, as much as possible while leaving 
« + 1. By "network control" we thus mean driving the system 
away from its initial state of ff(0) = <^'^ with A*^"'. Controlling 
individual nodes is achieved by applying a strong field (stronger 
than the magnitude of the field due to the node's upstream 
neighbors) to a set of targeted nodes T so that 



0 else 



(12) 



This ensures that the drug field can always overcome the field 
from neighboring nodes. 

In application, similarity nodes are never deliberately directly 
targeted, since changing their state would adversely affect both 
normal and cancer cells. Roughly 70% of the nodes in the 
networks surveyed are similarity nodes, so the search space is 
reduced. For p = 2, the effective edge deletion means that only the 
differential network in cancer cells needs to be simulated to 

determine the effectiveness oi h"^^. For p=l, however, there may 
be some similarity nodes that receive signals from upstream 
differential nodes. In this case, the full effect of h"'^^ can be 
determined only by simulating all differential nodes as well as any 
similarity nodes downstream of differential nodes. All following 
discussion assumes that all nodes examined are differential, and 
therefore targetable, for both p=l and p = 2. The existence of 
similarity nodes for/j= 1 only limits the set of targetable nodes. 



Directed acyclic networks. Full control of a directed acyclic 
network is achieved by forcing (T^=— cj^ for all qeQ. This 
guarantees m^ = — 1. Suppose that nodes geQ in an acyclic 
network have always been fixed away from the cancer state, that is, 
iT^(r-> — co)= — (f^. For any node i to have f7,(?) = £'', it is 
sufficient to have either ieQ or aj(t—l) = ^J for all j—^i, i^Q. 
Because there are no cycles present, all upstream paths of sufficent 
length terminate at a source. Because the spin of all nodes qeQ 
point away from the cancer attractor state, ail nodes downstream 
must also point away from the cancer attractor state. Thus, for 
acyclic networks, forcing cr^ = — £^ guarantees m^, = — 1 . The 
complications that arise from cycles are discussed in the next 
subsubsection. However, controlling nodes in Q may not be the 
most efficient way to push the system away from the cancer basin 
of attraction and, depending on the control limitations, it may not 
be possible. If minimizing the number of controllers is required, 
searching for the most important botdenecks is a better strategy. 

Consider a directed network G and an initially identical copy, 
G' = G. If removing node / (and ail connections to and from i) 
from G' decreases the indegree of at least one node jeV{G'),j^i, 
to less than half of its indegree in network G, {i} is a size 1 
bottleneck. The bottleneck control set of bottleneck {i}, L(i), is 
defined algorithmically as follows: (1) Begin a set L{i) with die 
current bottleneck i so that L = {i}; (2) Remove botdeneck {/} 
from network G'; (3) Append L{i) with all nodes j with current 
indegree that is less than half of that from the original network G; 
(4) Remove all nodes j from the network G' . If additional nodes in 
G' have their indegree reduced to below half of their indegree in 
G, go to step 3. Otherwise, stop. The impact of the bottleneck i, 
is defined as 



(13) 



where \X\ is the cardinality of the set X. The impact of a 
bottleneck is the minimum number of nodes that are guaranteed 
to switch away from the cancer state when the bottleneck is forced 
away from the cancer state. 





Similarity node 
Differential node 



Figure 1. Network segregation for two attractor states (/) = 2). Every edge that connects a similarity node to a differential node or a differential 
node to a similarity node transmits no signal. This means that the signaling in the right network shown above is identical to that of the left network. 
Because the goal is to leave normal cells unaltered while damaging cancer cells as much as possible, all similarity nodes can be safely ignored, and 
searches and simulations only need to be done on the differential subnetwork. 
doi:1 0.1 371/journal.pone.01 05842.g001 
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The impact is used to rank the size 1 bottlenecks by importance, 
with the most important as those with the largest impact. In 
application, when searching for nodes to control, any size 1 
bottleneck {/} that appears in the bottleneck control set of a 
different size 1 botdeneck {/} can be ignored, since fixing j to the 
normal state frxes / to the normal state as well. Note that the 
definition given above in terms of G and G avoids miscounting in 
the impact of a botdeneck. 

The network in Fig. 2, for example, has three sources (nodes 1, 
2 and 3), but one important bottleneck (node 6). If full damage, i.e. 
m^ = — 1, is required, then control of all source nodes is 
necessary. If minimizing the number of directiy targeted nodes is 
important and nf^ > — 1 can be tolerated, then control of the 
bottleneck node 6 is a better choice. 

Directed cycle-rich networks. Not all networks can be fully 
controlled at r = 0 by controlling the source nodes, however. If 
there is a cycle present, paths of infinite length exist and the final 
state of the system may depend on the initial state, causing parts of 
the net\vork to be hysteretic. Controlling only sources in a general 
directed network thus does not guarantee nf^^ = — 1 unless the 
system begins with (7, = — <^^. 

Define a cycle cluster, C, as a strongly connected subnetwork of 
a network G. The network in Fig. 3, for example, has one cycle 
cluster with nodes K(C) = {4,5,6,7}. If the network begins with 
ct(0) = ^' , forcing both source nodes away from the cancer state 
does nothing to the nodes downsteam of node 3 (see Fig. 4). This is 
because the indegree deg^(4) = 4, and a majority of the nodes 
connecting to node 4 are in the cancer attractor state. At 7^ = 0, 
cycle clusters with high connectivity tend to block incoming signals 
from outside of the cluster, resulting in an insurmountable 
activation barrier. 

The most effective single node to control in this network is any 
one of nodes 4, 6 or 7. Forcing any of these away from the cancer 
attractor state will eventually cause the entire cycle cluster to flip 
away from the cancer state, and all nodes downstream will flip as 
well, as shown in Fig. 4. The cycle cluster here acts as a sort of 
large, hysteretic botdeneck. We now generahze the concept of 
bottienecks. 

Define a size k bottleneck in a network G to be a cycle cluster B 
with \V(B)\=k which, when removed from G, reduces the 
indegree of at least one node jeV(G), j^V{B) to less than half of 
its original indegree. Other than now using the set of nodes V(B) 
rather than a single node set, the above algorithm for finding the 
botdeneck control set remains unchanged. In Fig. 3, for instance, 
K(fi) = {4,5,6,7}, k = 4, L(5) = {4,5,6,7,8,9,10}, and I{B) = 1. 
With this more general definition, we note that controlliug any size 
k botdeneck B guarantees control of ail size 1 bottlenecks B' in the 
control set of B for all > 1 . 

For any botdeneck B of size > 1 in a network G, define the set 
of critical nodes, Z(B,G), as the set of nodes Z(B,G)^ of 
minimum cardinality that, when controlled, guarantees full control 
of all nodes ie V(B) after a transient period. Also define the critical 
number of nodes as ncnt(B,G) = \Z(B,G)\. Thus, for the network in 
Fig. 3, Z{B,G) = {4}, {6}, or {7}, and «crit(-8,G)= 1- 

In general, however, more than one node in a cycle cluster may 
need to be targeted to control the entire cycle cluster. Fig. 5 shows 
a cycle cluster (composed of nodes 2-10) that cannot be controlled 
by targeting any single node. The precise value of «crit for a given 
cycle cluster C depends on its topology as well as the edges 
connecting nodes from outside of C to the nodes inside of C, and 
finding Z(C,G) can be difficult. We present a theorem that puts 
bounds on ricrit to help determine whether a search for Z(C,G) is 
practical. 




Figure 2. A directed acyclic networlc. Controlling all three source 
nodes (nodes 1, 2 and 3) guarantees full control of the network, but are 
Ineffective when targeted individually. The best single node to control 
in this network is node 6 because it directly controls all downstream 
nodes. 

doi:l 0.1 371/journal.pone.01 05842.g002 

Theorem: Suppose a network G contains a cycle cluster C. 
Define the set of externally influenced nodes 

RiC,G) = {ieV(C) ■.jeViG\C),(j,i)eE(G)}, (14) 
the set of intruder connections 

WiC,G) = {lj,i)eE(G) : ieV(C)JeV(G\C)}, (15) 
and the reduced set of critical nodes 

Zr^(C,G) = Z(C,G\W). (16) 

If iV=|K(C)| and 

where deg~ {i) is computed ignoring intruder connections, then 
r^l<«crit(C,G)<C, (18) 

where 

C^minj^f^] +\R{C,G)\Zr^(C,G)\,N^ . (19) 

Proof: First, prove the lower limit of Eq. 18. Let C be a cycle 
cluster in a network G with R(C,G) = {0}. (A cycle cluster in a 
network with |i^(C,G)| >0 will have the same or higher activation 
barrier for any node in the cluster than the same cycle cluster in a 
network with J? = {0}. Since we are examining the lower limit of 
Eq. 18, we consider the case with the lowest activation barrier. 
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Figure 3. A network in wKiicKi nodes 4, 5, 6 and 7 compose a 
singie cycle cluster. The high connectivity of node 4 prevents any 
changes made to the spin of nodes 1-3 from propagating downstream. 
The only way to indirectly control nodes 8-10 is to target nodes inside 
of the cycle cluster. Targeting node 4, 6 or 7 will cause the entire cycle 
cluster to flip away from its initial state, guaranteeing control of nodes 
4-10 (see Fig. 4). 

dor:1 0.1 371 /journal.pone.01 05842.g003 

Any externally influenced nodes cause «crit to either increase or 
remain the same.) For any node / to be able to flip away fi-om the 
cancer state (although not necessarily remain there), we must have 
that hi= —a^l for a>0, meaning that at least half of the nodes 
upstream of i must point away from the cancer state. The node / 
requiring the smallest number of upstream nodes to be in the 
normal state is the node that satisfies deg^ (i) = fi. Controlling less 
than fi/2 nodes will leave all uncontrolled nodes with a field in the 
cancer direction, and no more flips will occur. Thus, 

«crit>rf1- (20) 

For the upper limit of Eq. 18, consider a complete clique on 
nodes, C = Kf/ (that is, Aij = \ for all i,JeV(Kf/), including self 
loops) in a network G. First, let there be no connections to any 
nodes in C from outside of C so that R{C,G) = {0}. For odd N, 
forcing {N+l)/2 nodes away from the cancer state will result in 
the field 

EM.'C^-'^y.^-e, (2.) 

for all nodes i. After one time step, all nodes will flip away from the 
cancer state. For even N, forcing N/2 nodes away from the cancer 
state will result in the field 

for all nodes i. At the next time step, the unfixed nodes wiU pick 
randomly between the normal and cancer state. If at least one of 
these nodes makes the transition away from the cancer state, the 



field at all other nodes will point away from the cancer direction. 
The system will then require one more time step to completely 
settle to (T, = — f^.. Thus, we have that for C = Kff in a network G 
with J?(C,G) = {0}, 

«crit(^JV,G)=[^^. (23) 

Kf^ with ff,(0) = (^J^ gives the largest activation barrier for any cycle 
cluster on N nodes with R(C,G) = {0} to switch away from the 
cancer attractor state. A general cycle cluster C with any topology 
on N nodes with R{C,G) = {0} in a network G will have 
deg~(i)<A^ for all nodes i, and so we have the upper bound 

n^t{C,G)<\^, (24) 

thus proving Eq. 18 for the special case ot R{C,G) = {0}. 

Now consider a cycle cluster C on nodes in a network G with 
\R{C,G)\>Q. Suppose all nodes in Zfed(C,G) are fixed away 
from the cancer state. By Eq. 24, |Zred(C,G)| < rA''/21. For any 
node is{R{C,G)r\Zjei{C,G)), (Ji{t^co)= —^'j is guaranteed 
because it has already been direcdy controlled. Any node 
ie(R(C,G)\Zi^^i(C,G)) has some incoming connections from 
nodes j^V{C), and these connections could increase the activation 
barrier enough such that fixing Z,cd(C,(j) is not enough to 
guarantee (7,(/->oo)= — i^^. To ensure that any node leV{C) 
points away from the cancer state, it is sufficient to fix all nodes 
ie(7?(C,G)\Zred(C,G)) as well as Zred(C,G) away from the cancer 
state. This increases «crit by at most \R{C,G)\Zj^{C,G)\, leaving 

«crit(C,G) < [y] + \R{C,G)\Z,^{C,G)\- (25) 

«crit can never exceed A'^, however, because directly controlling 
every node results in controlling C. We can thus say that 

«crit(C,G) < min ( [y] + \R{C,G)\Z„i{Cfi)\,N\ . (26) 



Finally, combining the upper limit in Eq. 26 with the lower limit 
from Eq. 20 gives Eq. 18. ■ 

There can be more than one Zj^d for a given cycle cluster. Note 
that the tightest constraints on «crit in Eq. 18 come from using the 
Zred with the largest overlap with R. If finding Zjed is too diflrcult, 
an overestimate for the upper hmit of «crit can be made by 
assuming that -Rn-Zred = {0} so that 

[f! <«crit(C,G)< min(^[^] +\RiC,G)\, ^) ■ (27) 

The cycle cluster in Fig. 5 has N = 9, R = {2,9}, 1, and one 
of the reduced sets of critical nodes is Zred = {9,10}, so 
l<«crit^6. It can be shown through an exhaustive search that 
for this network = 2, and the set of critical nodes is Z = {9, 10} 
(see Fig. 6). Here, Z = Zred, although this is not always the case. 
Because the cycle cluster has 9 nodes and 1 < «crit ^ 6, at most 




= 465 simulations are needed to find at least one 



solution for Z{C,G). However, the maximum number of 
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Figure 4. Cancer magnetization from targeting various nodes in the networl< shown in Fig. 3, averaged over 10,000 runs. The 

averaging removes fluctuations due to the random flipping of nodes with /;, = 0. Targeting node 7 results in the quickest stabilization, but targeting 
any one of nodes 4, 6 or 7 results in the same final magnetization. 
doi:1 0.1 371 /journal.pone.01 05842.g004 



simulations required to find Z(C,G) increases exponentially and 
for larger networks the problem quickly becomes intractable. 

One heuristic strategy for controlling cycle clusters is to look for 
size k' <\V(C)\ bottlenecks inside of C. Bottlenecks of size k»l 
and average indegree <(deg^(i?))«fc can contain high impact size 
k' botdenecks, where k' < k. Size k>l bottlenecks need to be 
compared to find the best set of nodes to target to reduce m'^^ . 
Simply comparing the impact is insufficent because a cycle cluster 
with a large impact could also have a large ncrit, requiring much 
more effort than its impact merits. Define the critical efficiency of a 
bottleneck B as 



fopt 



(B)-- 



max 

n=l,2,.. 



(29) 



ecnt(B)-- 



m 



(28) 



If the critical efficiency of a cycle cluster is much smaller than the 
impacts of size 1 bottlenecks from outside of the cycle cluster, the 
the cycle cluster can be safely ignored. 

For some cycle clusters, however, not all of the nodes need to be 
controlled in order for a large portion of the nodes in the cycle 
cluster's control set to flip. Define the optimal efficiency of a 
bottieneck B as 



where 5, S V{B) are size 1 bottlenecks and I{Bi)>I{Bj+i) for all 
/. Note that for any size 1 bottieneck B, eopi(B) = e^ntlB) = 1(B). 
This quantity thus allows bottlenecks with very different properties 
{1(B), nait(B,G), or to be ranked against each other. 

All strategies presented above are designed to select the best 
individual or small group of nodes to target. Significant changes in 
the biological networks' magnetization require targeting many 
nodes, however. Brute force searches on the effect of larger 
combinations of nodes are typically impossible because the 
required number of simulations scales exponentially with the 
number of nodes. A crude Monte Carlo search is also numerically 
expensive, since it is difficult to sample an appreciable portion of 
the available space. One alternative is to take advantage of the 
bottlenecks that can be easily found, and rank all size k>l 
bottlenecks Bj in an ordered list U such that 



U = (BuB2,Bi,...) 



(30) 



where 
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Figure 5. A network with a cycle cluster C, composed of nodes 
2-10, that cannot be controlled at 7 = 0 by controlling any 
single node. Here, the set of externally influenced nodes is 
«(C,G) = {2,9}, the set of intruder connections is W(C,G)^{(\,2), 
(1,9)}, the reduced set of critical nodes is Z,.cd(C,G) = {9,10}, the 
minimum indegree is /< = 1 and the number of nodes in the cycle cluster 
is A' = 9. By Eq. 1 8, this gives the bounds of the critical number of nodes 

to be 1 < «crit ^ 6. 

doi:1 0.1 371/journal.pone.01 05842.g005 



^opt 



(31) 



for all Bj,BjeU and fix the bottlenecks in the list in order. This is 
called the efficiency-ranked strategy. If all size k>\ bottienecks are 
ignored, it is called the pure efficiency-ranked strategy, and if size 



k>\ bottlenecks are included it is called the mixed efficiency- 
ranked strategy. 

An effective polynomial-time algorithm for fmding the top z 
nodes to fix, which we call the hest+1 strategy (equivalent to a 
greedy algorithm), works as follows: (1) Begin with a seed set of 
nodes to fix, F; (2) Test the effect of fixing F[Ji for all allowed 
nodes i^F; (3) i^<-i^U!best, where /best is the best node from all / 
sampled; (4) If <z, go to step (2). Otherwise, stop. The seed set 
of nodes could be the single highest impact size 1 bottleneck in the 
network, or it could be the best set of n nodes (where m < z) found 
from a brute force search. 

Cancer Signaling 

In application to biological systems, we assume that the 
magnetization of cell type a is related to the viability of cell type 
a, that is, the fraction of cells of type a that survives a drug 
treatment. It is reasonable to assume that the viability of cell type 
a, v"{m'^), is a monotonicaUy increasing function of m"^ . Because 
the exact relationship is not known, we analyze the effect of 
external perturbations in terms of the final magnetizations. 

We need to use as few controllers as possible to sufficiendy 
reduce m^, while leaving m"^ « + 1 . In practical applications, 
however, one is limited in the set of druggable targets. AU classes of 
drugs are constrained to act only on a specific set of biological 
components. For example, one class of drugs that is currendy 
under intense research is protein kinase inhibitors [35]. In this case 
one has two constraints: the only nodes that can be targeted are 
those that correspond to kinases, and they can only be inhibited, 
i.e. turned off We will use the example of kinase inhibitors to show 
how control is affected by such types of constraints. In the real 
systems studied, many differential nodes have only similarity nodes 
upstream and downstream of them, while the remaining 
differential nodes form one large cluster. This is not important 
for 77= I, but the effective edge deletion for ^ = 2 results in many 
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Figure 6. Magnetization for network from Fig. 5, averaged over 10,000 runs. There is no single node to target that will control the cycle 
cluster, but fixing nodes 9 and 10 results in full control of the cycle cluster, leaving only node 1 in the cancer state. This means Z{C,G) = {9,10} and 
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Table 2. General properties of the full networks. 



Properties 


Lung 


B cell 


Nodes 


9073 


4364 


Edges 


45635 


55144 


Sources 


129 


8 


Sinks 


8443 


1418 


Av. outdegree 


5.03 


12.64 


Max outdegree 


240 


2372 


Max indegree 


68 


196 


Self-loops 


238 


0 


Undirected edges 


350 


23386 


Diameter 


11 


11 


Max cycle cluster 


401 


2886 


Av. clustering coeff. [73] 


0.0544 


0.2315 



The network used for the analysis of lung cancer is a generic one obtained 
combining the data sets in Refs. [32] and [33]. The B cell network is a curated 
version of the B cell interactome obtained in Ref. [34] using a network 
reconstruction method and gene expression data from B cells. 
doi:l 0.1 371/journal.pone.Ol 05842.t002 



islets, which are nodes i with Ajj = Aji = Q for all i^j (self-loops 
allowed). Controlling islets requires targeting each islet individu- 
ally. For p = 2, we concentrate on controlling only the largest 
weakly connected differential subnetwork. AU final magnetizations 
are normalized by the total number of nodes in the fuU network, 
even if the simulations are only conducted on small portion of the 
network. 

The data fdes for all networks and attractors analyzed below can 
be found in Supporting Information. 

Lung Cell Network 

The network used to simulate lung cells was buUt by combining 
the kinase interactome from PliosphoPOINT [32] with the 
transcription factor interactome from TRANSFAC [33]. Both of 
these are general networks that were constructed by compiling 
many observed pairwise interactions between components, mean- 
ing that ify-*/, at least one of the proteins encoded by gene j has 
been directly observed interacting with gene i in experiments. This 
bottom-up approach means that some edges may be missing, but 
those present are reliable. Because of this, the network is sparse 
(~ 0.057% complete, see Table 2), resulting in the formation of 
many islets for p = 2. Note also that this network presents a clear 
hierarchical structure, characteristic of biological networks 
[36,37], with many "sink" nodes [38] that are targets of 
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Figure 7. Final cancer magnetizations for an unconstrained search on the lung cell network using p= 1. The efficiency-ranked strategy 
outperforms the relatively expensive Monte Carlo strategy. The best-Fl strategy works best, although it requires the largest computational time. Note 
that the mixed efficiency-ranked curve is not shown because it is identical to the pure efficiency-ranked curve. Key for magnetization curves: IVIC = 
Monte Carlo, B+^ = best-Fl, ERP= pure efficiency-ranked, ERM= mixed efficiency-ranked, EX= exhausive search. 
doi:1 0.1 371/journal.pone.01 05842.g007 
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Figure 8. Final cancer magnetizations for an unconstrained search on the lung cell network using p=2. As In the p = 1 case, the 
efficiency-ranked strategy outperforms the expensive Monte Carlo search. The plateaus In the efficiency-ranked strategy when fixing 9-10, 12-1 5, 20- 
21, etc. nodes are a result of targeting bottlenecks that are already indirectly controlled. 
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transcription factors and a relatively large cycle cluster originating 
from the kinase interactome. 

It is important to note that this is a non-specific network, 
whereas real gene regulatory networks can experience a sort of 
"rewiring" for a single cell type under various internal conditions 
[39]. In this analysis, we assume that the difference in topology 
between a normal and a cancer cell's regulatory network is 
negligible. The methods described here can be applied to more 
specialized networks for specific cell types and cancer types as 
these networks become more widely available. 

In our signaling model, the IMR-90 cell line [40,41] was used 
for the normal attractor state, and the two cancer attractor states 
examined were from the A549 (adenocarcinoma) [42^6] and 
NCI-H358 (bronchioalveolar carcinoma) [42,43] cell lines. Gene 
expression measurements from all referenced studies for a given 
cell line were averaged together to create a single attractor. The 
resulting magnetization curves for A549 and NCTH358 are very 
similar, so the following analysis addresses only A549. The fuU 
network contains 9073 nodes, but only 1175 of them are 
differential nodes in the IMR-90/A549 model. In the uncon- 
strained p=l case, all 1175 differential nodes are candidates for 
targeting. Exhaustively searching for the best pair of nodes to 
control requires investigating 689725 combinations simulated on 
the full network of 9073 nodes. However, 1094 of the 1 175 nodes 
are sinks (i.e. nodes i with outdegree deg+(0 = 0, ignoring self 



loops) and therefore have /(O = eopt(0= which can be safely 
ignored. The search space is thus reduced to 8 1 nodes, and fmding 
even the best triplet of nodes exhaustively is possible. Including 
constraints, only 3 1 nodes are differential kinases with = -f 1 . 
This reduces the search space at the cost of increasing the 
minimum achievable mi^,. 

There is one important cycle cluster in the fuU network, and it is 
composed of 401 nodes. This cycle cluster has an impact of 7948 
for p=l, giving a critical efficiency of at least ~19.8, and 
l<«crit^401 by Eq. 27. The optimal efficiency for this cycle 
cluster IS t?opt — 29, but this IS achieved for fixing the first bottleneck 
in the cluster. Additionally, this node is the highest impact size 1 
bottleneck in the full network, and so the mixed efficiency-ranked 
results are identical to the pure efficiency-ranked results for the 
unconstrained p = 1 lung network. The mixed efficiency-ranked 
strategy was thus ignored in this case. 

Fig. 7 shows the results for the unconstrained p = 1 model of the 
IMR-90/A549 lung cell network. (All simulations were performed 
using MATLAB on a desktop computer. Running the simulations 
to make all curves shown below required approximately 12 hours.) 
The unconstrained p=l system has the largest search space, so the 
Monte Carlo strategy performs poorly. The best-Hl strategy is the 
most effective strategy for controlling this network. The seed set of 
nodes used here was simply the size 1 bottleneck with the largest 
impact. Note that best-Hi works better than efFeciency-ranked. 
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This is because best+1 includes the synergistic effects of fixing 
multiple nodes, while efficiency-ranked assumes that there is no 
overlap between the set of nodes downstream from multiple 
bottlenecks. Importantiy, however, the efficiency-ranked method 

£ works nearly as well as best+1 and much better than Monte Carlo, 

I both of which are more computationally expensive than the 

1 efficiency-ranked strategy. 

g Fig. 8 shows the results for the unconstrained p = 2 model of the 

2 IMR-90/A549 lung cell network. The search space for ^ = 2 is 
^ much smaller than that for p=l. The largest weakly connected 
I, differential subnetwork contains only 506 nodes (see Table 3), and 
"g the remaining differential nodes are islets or are in subnetworks 
= composed of two nodes and are therefore unnecessary to consider. 

0 Of these 506 nodes, 450 are sinks. Fig. 9 shows the largest weakly 

1 connected component of the differential subnetwork, and the top 
g five bottlenecks in the unconstrained case are shown in red. If 
I, limiting the search to differential kinases with = -f 1 and 
^ ignoring all sinks, p = 2 has 19 possible targets. There is only one 
S cycle cluster in the largest differential subnetwork, containing 6 
3, nodes. Like the p=l case, the optimal efficiency occurs when 

I targeting the first node, which is the highest impact size 1 
.§ bottleneck. Because the mixed efiiciency-ranked strategy gives the 
£ same results as the pure efEciency-ranked strategy, only the pure 
^ strategy was examined. The Monte Carlo strategy fares better in 

3 the unconstrained p = 2 case because the search space is smaller. 
£ Additionally, the efiiciency-ranked strategy does worse against 

II the best+1 strategy for p = 2 than it did for /? = 1. This is because 
-rj the effective edge deletion decreases the average indegree of the 

c network and makes nodes easier to control indirectiy. When many 

— upstream botdenecks are controlled, some of the downstream 

y bottlenecks in the efiiciency-ranked list can be indirectiy 

Q controlled. Thus, controlling these nodes directiy results in no 

Q change in the magnetization. This gives the plateaus shown for 

= fixing nodes 9-10 and 12-15, for example. 

I The only case in which an exhaustive search is possible is for 
■S P = 2 with constraints, which is shown in Fig. 10. Note that the 

0 polynomial-time best+ 1 strategy identifies the same set of nodes as 
^ the exponential-time exhaustive search. This is not surprising, 

II however, since the constraints limit the available search space. 
^_ This means that the Monte Carlo also does well. The efficiency- 

1 ranked method performs worst. The efficiency-ranked strategy is 

0 designed to be a heuristic strategy that scales gently, however, and 
is not expected to work well in such a small space when compared 

H with more computationally expensive methods. 
II 

5 B Cell Network 

g The B cell network was derived from the B cell iiiteractome of 

-ii Ref [34]. The reconstruction method used in Ref [34] removes 

'fi edges from an initially complete network depending on pairwise 

^ gene expression correlation. Additionally, the original B cell 

^ network contains many protein-protein interactions (PPIs) as well 

1 as transcription factor-gene interactions (TFGIs). TFGIs have 
"5 g definite directionality: a transcription factor encoded by one gene 

g ^ affects the expression level of its target gene(s). PPIs, however, do 

^ m not have obvious directionality. We first filtered these PPIs by 

^ g checking if the genes encoding these proteins interacted according 

II 1 to the PhosphoPOINT/TRANSFAC network of the previous 

^ %■ section, and if so, kept the edge as directed. If the remaining PPIs 

I E are ignored, the results for the B cell are similar to those of the lung 
° :2, cell network. We found more interesting results when keeping the 
° ^ remaining PPIs as undirected, as is discussed below. 

5 5 Because of the network construction algorithm and the inclusion 

II ;5 of many undirected edges, the B cell network is more dense 
(~ 0.290% complete, see Table 2) than the lung cell network. This 
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Figure 9. Largest weakly connected differential subnetwork for IMR-90/A549 and p=2. Out of the 506 pictured nodes, 450 are sinks and 
therefore have an impact equal to one. The top five bottlenecks are labeled w\th their gene names and colored orange. 
doi:1 0.1 371 /journal.pone.01 05842.g009 



higher density leads to many more cycles than the lung cell 
network, and many of these cycles overlap to form one very large 
cycle cluster containing ~66% of nodes in the fuU network. AU 
gene expression data used for B cell attractors was taken from Ref 
[47]. We analyzed two types of normal B cells (naive and memory) 
and three types of B cell cancers (diffuse large B-ceU lymphoma 
(DLBCL), follicular lymphoma, and EBV-immortalized lympho- 
blastoma), giving six combinations in total. We present results for 
only the naive/DLBCL combination below, but Tables 3 and 4 
hst the properties of all normal/ cancer combinations. Again, all 
gene expression measurements for a given cell type were averaged 
together to produce a single attractor. The fuU B cell network is 
composed of 4364 nodes. For p=l, there is one cycle cluster C 



composed of 2886 nodes. This cycle cluster has l<Wcrit(C) 
< 1460, /(C) = 4353, and 3.0 <ecrit(C)< 4353. Finding Z(C) was 
deemed too difficult. 

Fig. 1 1 shows the results for the unconstrained p=l case. Again, 
the pure efficiency-ranked strategy gave the same results as the 
mixed efficiency-ranked strategy, so only the pure strategy was 
analyzed. As shown in Fig. 11, the Monte Carlo strategy is out- 
performed by both the efficiency-ranked and best+1 strategies. 
The synergistic effects of fixing multiple bottlenecks slowly 
becomes apparent as the best+1 and efficiency-ranked curves 
separate. 

Fig. 12 shows the results for the unconstrained p = 2 case. The 
largest weakly connected subnetwork contains one cycle cluster 
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Figure 10. Final cancer magnetizations for a constrained search on the lung cell networit using p=2. This Is the only case In which a 
limited exhaustive search Is possible. Interestingly, the exhaustive search locates the same nodes as the best+1 strategy for fixing up to eight nodes. 
The efficiency-ranked strategy performs poorly compared to the Monte Carlo strategy because the search space Is small and a large portion of the 
available space Is sampled by the Monte Carlo search. 
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with 351 nodes, with l<«crit^208. Although finding a set of 
critical nodes is difficult, the optimal efficiency for this cycle cluster 
is 62.2 for fixing 10 bottlenecks in the cycle cluster. This makes 
targeting the cycle cluster worthwhile. The efficiency of this set of 
1 0 nodes is larger than the efficiencies of the first 10 nodes from 
the pure efficiency-ranked strategy, so the rrf^^ from the mixed 
strategy drops earlier than the pure strategy. Both strategies 
quickly identify a small set of nodes capable of controlling a 
significant portion of the differential network, however, and the 
same result is obtained for fixing more than 10 nodes. The best+1 
strategy finds a smaller set of nodes that controls a similar fraction 
of the cycle cluster, and fixing more than 7 nodes results in only 
incremental decreases in rrf^^ . The Monte Carlo strategy performs 
poorly, never finding a set of nodes adequate to control a 
significant fraction of the nodes in the cycle cluster. 

Conclusions 

Signaling models for large and complex biological networks are 
becoming important tools for designing new therapeutic methods 
for complex diseases such as cancer. Even if our knowledge of 
biological networks is incomplete, rapid progress is currendy being 
made using reconstruction methods that use large amounts of 
publicly available omic data [12,13]. The Hopfield model we use 
in our approach allows mapping of gene expression patterns of 



normal and cancer cells into stored attractor states of the signaling 
dynamics in directed networks. The role of each node in disrupting 
the network signaling can therefore be explicidy analyzed to 
identify isolated genes or sets of strongly connected genes that are 
selective in their action. We have introduced the concept oisize k 
holllnecks to identify such genes. This concept led to the 
formulation of several heuristic strategies, such as the efficiency- 
ranked and hest+1 strategy to find nodes that reduce the overlap of 
the cell network with a cancer attractor. Using this approach, we 
have located small sets of nodes in lung and B cancer cells which, 
when forced away from their initial states with local magnetic 
fields (representing targeted drugs), disrupt the signaling of the 
cancer cells while leaving normal cells in their original state. For 
networks with few targetable nodes, exhaustive searches or Monte 
Carlo searches can locate effective sets of nodes. For larger 
networks, however, these strategies become too cumbersome and 
our heuristic strategies represent a feasible alternative. For tree-like 
networks, the pure efficiency-ranked strategy works well, whereas 
the mixed efficiency-ranked strategy could be a better choice for 
networks with high-impact cycle clusters. 

We make two important assumptions in applying this analysis to 
real biological systems. First, we assume that genes are either fuUy 
off or fully on, with no intermediate state. Modelling the state of a 
neuron as "all-or-none" has long been accepted as a reasonable 
assumption [48] , which provided the spin glass framework for the 
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Figure 11. Final cancer magnetizations for an unconstrained search on the B cell network using p = l. The Monte Carlo strategy is 
ineffective for fixing any number of nodes. The efficiency-ranl<ed and best+1 curves slowly separate because synergistic effects accumulate faster for 
best+1. 
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Hopfield model. While similar switch-like behavior in gene 
regulatory networks has been proposed as an explanation of, for 
example, segmentation in Drosophila embryos [49], assigning a 
Boolean value to gene expression may be overly simplistic in many 
cases. A model which uses spins with more than two projections 
could prove to be more reahstic and predictive. Second, we 
assume that all nodes update their status with a single timescale 
and with a single interaction strength. If the signaling timescale T,y 
for each edge in the biological network is known, simulations could 
be conducted in which a signal traveling along an edge (jj) 
reaches its target after T/, time steps. This would amount to a 
synchronous update schedule with a "queue" of signals moving 
between nodes. Likewise, our model gives equal weight to all edges 
(aside from edges that are effectively deleted in the p = 2 case), 
whereas real gene regulatory networks exhibit a spectrum of 
interaction strengths. This can easily be integrated with our model 
by using a weighted, directed adjacency matrix. However, doing 
this would surely require a change in control strategy. 

Despite these issues, our model shows promise. Some of the 
genes identified in Table 4 are consistent with current clinical and 
cancer biology knowledge. For instance, in the lung cancer list we 
found a well known tumor suppressor gene (TPS 3) [50] that is 
frequently mutated in many cancer types including lung cancer 
[51]. Mutations in PBXl have recendy been detected in non- 
small-cell lung cancer and this gene is now being considered as a 



target for therapy and prognosis [52]. MAP3K3 and MAP3K14 
are in the MAPK/ERK pathway which is a target of many novel 
therapeutic agents [53], and SRC is a well known oncogene and a 
candidate target in lung cancer [54]. BCL6 (B-ceU lymphoma 6) is 
the most common oncogene in DLBCL, and it is known that its 
expression can predict prognosis and response to drug therapy 
[55-57]. BCL6 is also frequendy mutated in follicular lymphoma 
[58,59]. Our analysis identified BCL6 as an important drug target 
for both DLBCL and follicular lymphoma using either naive or 
memory B-cells as a control for both p=l and p = 2. RBL2 
disregulation has been recently associated with many types of 
lymphoma [60-62]. FOXMl is a potential therapeutic target in 
mature B cell tumors [63] and ATF2 has been recently found to be 
highly disregulated in lymphoma [64,65]. Besides BCL6 discussed 
above, the N/D hst for DLBCL contains genes (MEF2A [66], 
NCOAl [67,68], TGIFl [69-71], NFATC3 [72]) that are all 
known to have a functional role in cancer, even if they have not 
been associated to the specific B-cell cancer types we have 
considered. Our predictions are for the immortalized cell lines we 
have selected, some of which are commonly used for in-vitro 
testing in many laboratories. RNAi and targeted drugs could then 
be used in these cell lines against the top scoring genes in Table 4 
to test the disruption of survival or proliferative capacity. If 
experimentally validated, our analysis based on attractor states and 
bottlenecks could be applied to patient-derived cancer cells by 
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Figure 12. Final cancer magnetizations for an unconstrained search on the B cell network using p=2. The rather sudden drop in the 
magnetization between controlling 5 and 10 nodes in the efficiency-ranked strategies comes from flipping a significant portion of a cycle cluster. This 
is the only network examined in which the mixed efficiency-ranked strategy produces results different from the pure efficiency-ranked strategy. 
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integrating in the model patient gene expression data to identify 
patient-specific targets. 

The above unconstrained searches assume that there exists 
some set of "miracle drugs" which can turn any gene "on" and 
"oflP' at will. This limitation can be patially taken into account by 
using constrained searches that limit the nodes that can be 
addressed. However, even the constrained search results are 
unrealistic, since most drugs directly target more than one gene. 
Inhibitors, for example, could target differential nodes with 
^'j = —l and ^" = + 1, which would damage only normal cells. 
Additionally, drugs would not be restricted to target only 
differential nodes, and certain combinations could be toxic to 
both normal and cancer cells. Few cancer treatments involve the 
use of a single drug, and the synergistic effects of combining 
multiple drugs adds yet another level of complication to finding an 
effective treatment [27]. On the other hand, the intrinsic 
nonlinearity of a cellular signaling network, with its inherent 
structure of attractor states, enhances control [31] so that a 
properly selected set of druggable targets might be sufficient for 
robust control. 

Supporting Information 

Table SI Lung cell network. The column labeled "Source 
EzID" contains the Entrez IDs of transcription factors and kinases. 



and "Target EzID" contains the Entrez IDs of the genes targeted 

by the transcription factor or kinase to its left. 

(TXT) 

Table S2 IMR-90/A549 attractors for lung cell network. 

The column labeled "EzID" contains the Entrez ID of the genes. 
The second and third columns are the normal and cancer 
attractor, respectively. 
(TXT) 

Table S3 IMR-90/NCI-H358 attractors for lung cell 
network. The column labeled "EzID" contains the Entrez ID 
of the genes. The second and third columns are the normal and 
cancer attractor, respectively. 
(TXT) 

Table S4 B cell network. The column labeled "Source EzID" 
contains the Entrez IDs of transcription factors and kinases, and 
"Target EzID" contains the Entrez IDs of the genes targeted by 
the transcription factor or kinase to its left. 
(TXT) 

Table S5 Memory/DLBCL attractors for B cell net- 
work. The column labeled "EzID" contains the Entrez ID of the 
genes. The second and third columns are the normal and cancer 
attractor, respectively. 
(TXT) 
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Table S6 Memory/EBV-immortalized lymphoblastoma 
attractors for B cell network. The column labeled "EzID" 
contains the Entrez ID of the genes. The second and third 
columns are the normal and cancer attractor, respectively. 
(TXT) 

Table S7 Memory/ follicular lymphoma attractors for B 
cell network. The column labeled "EzID" contains the Entrez 
ID of the genes. The second and third columns are the normal and 
cancer attractor, respectively. 

(TXT) 

Table S8 Naive/DLBCL attractors for B cell network. 

The column labeled "EzID" contains the Entrez ID of the genes. 
The second and third columns are the normal and cancer 
attractor, respectively. 
(TXT) 

Table S9 Naive/EBV-immortalized lymphoblastoma 
attractors for B cell network. The column labeled "EzID" 
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